Automatic Error Analysis for Morphologically Rich Languages
Abstract
This paper presents AMEANA, an open-source tool for error analysis in natural language processing tasks that target morphologically rich languages. Unlike standard evaluation metrics such as BLEU or WER, AMEANA automatically produces a detailed error analysis that helps researchers and developers understand the strengths and weaknesses of their systems. AMEANA can be adapted to any language for which a morphological analyzer exists. In this paper, we focus on its application to Machine Translation (MT) and demonstrate it specifically for English-to-Arabic MT.
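To illustrate how a morphological analyzer turns a flat word mismatch into a diagnosis, here is a minimal sketch of per-feature error counting. It is not AMEANA's actual implementation. It assumes that hypothesis and reference tokens are already aligned one-to-one by position, and it uses a hypothetical lookup table (`TOY_ANALYZER`) with transliterated Arabic forms in place of a real analyzer. The names `TOY_ANALYZER` and `feature_errors` are illustrative only.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical analyzer output: each surface form maps to a lemma plus a
# bundle of morphological features. A real tool would query a
# language-specific morphological analyzer instead of this toy table.
TOY_ANALYZER: Dict[str, Dict[str, str]] = {
    "kataba": {"lemma": "katab", "gen": "m", "num": "sg"},   # "he wrote"
    "katabat": {"lemma": "katab", "gen": "f", "num": "sg"},  # "she wrote"
    "kitAb": {"lemma": "kitAb", "num": "sg"},                # "book"
    "kutub": {"lemma": "kitAb", "num": "pl"},                # "books"
}


def feature_errors(hyp: List[str], ref: List[str],
                   analyzer: Dict[str, Dict[str, str]]) -> Counter:
    """Count error types over position-aligned hypothesis/reference tokens.

    Assumes the two token lists are already aligned one-to-one.
    """
    errors: Counter = Counter()
    for h, r in zip(hyp, ref):
        h_analysis = analyzer.get(h)
        r_analysis = analyzer.get(r)
        if h_analysis is None or r_analysis is None:
            # The analyzer cannot explain one of the words.
            errors["unanalyzed"] += 1
            continue
        if h_analysis["lemma"] != r_analysis["lemma"]:
            # Wrong word choice, not an inflection error.
            errors["lemma"] += 1
            continue
        # Same lemma: blame each reference feature the hypothesis got wrong.
        for feat, value in r_analysis.items():
            if feat != "lemma" and h_analysis.get(feat) != value:
                errors[feat] += 1
    return errors
```

For example, `feature_errors(["katabat", "kutub"], ["kataba", "kitAb"], TOY_ANALYZER)` yields one gender error and one number error. WER would report the same pair only as two substitutions, with no indication that the lemmas were correct and only the inflection was wrong.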
Related papers
Automatic Extraction of Morphological Lexicons from Morphologically Annotated Corpora
We present a method for automatically learning inflectional classes and associated lemmas from morphologically annotated corpora. The method consists of a core language-independent algorithm, which can be optimized for specific languages. The method is demonstrated on Egyptian Arabic and German, two morphologically rich languages. Our best method for Egyptian Arabic provides an error reduction o...
PE2rr Corpus: Manual Error Annotation of Automatically Pre-annotated MT Post-edits
We present a freely available corpus containing source language texts from different domains along with their automatically generated translations into several distinct morphologically rich languages, their post-edited versions, and error annotations of the performed post-edit operations. We believe that the corpus will be useful for many different applications. The main advantage of the approa...
Special Techniques for Constituent Parsing of Morphologically Rich Languages
We introduce three techniques for improving constituent parsing for morphologically rich languages. We propose a novel approach to automatically find an optimal preterminal set by clustering morphological feature values and we conduct experiments with enhanced lexical models and feature engineering for rerankers. These techniques are specially designed for morphologically rich languages (but th...
Error Analysis and Improving Speech Recognition for Latvian Language
Developing a large-vocabulary automatic speech recognition system is a very difficult task due to high domain and acoustic variability. The task is even more difficult for Latvian, which is morphologically very rich and in which one word can have dozens of surface forms. Although there is some research on speech recognition for Latvian, Latvian ASR remains behin...
Arabic Language Modeling with Finite State Transducers
In morphologically rich languages such as Arabic, the number of word forms produced by combining morphemes is significantly greater than in languages with fewer inflected forms (Kirchhoff et al., 2006). This exacerbates the out-of-vocabulary (OOV) problem: test set words are more likely to be unknown, limiting the effectiveness of the model. The goal of this study is to use t...